Abstract: The notion of source polarization is introduced and
investigated. This complements the earlier work on channel
polarization. An application to Slepian-Wolf coding is also considered.
The paper is restricted to the case of binary alphabets. Extension
of results to non-binary alphabets is discussed briefly.

Index Terms: Polar codes, source polarization, channel polarization,
source coding, Slepian-Wolf coding.

I. Introduction 

We introduce the notion of "source polarization", which
complements the "channel polarization" studied in [1].
One immediate application of source polarization is the design
of polar codes for lossless source coding. Lossless source
coding using polar codes has already been considered
extensively in the pioneering works [2] and [3], which reduced
this problem to one of channel polarization using the duality
between the two problems. The approach in this paper is direct
and offers an alternative (primal) viewpoint.

This paper is restricted mostly to binary memoryless
sources. At the end, we briefly indicate possible
generalizations to non-binary sources.

We use the notation of [1]. In particular, we write $u^N$ to
denote a vector $(u_1, \ldots, u_N)$ and $u_i^j$ to denote the sub-vector
$(u_i, \ldots, u_j)$ for any $1 \le i \le j \le N$. If $j < i$, $u_i^j$ is the
null vector. The logarithm is to the base 2 unless otherwise
indicated. We write $X \sim \mathrm{Ber}(p)$ to denote a Bernoulli
random variable (RV) with values in $\{0, 1\}$ and $P_X(1) = p$.
The entropy $H(X)$ of such a RV is sometimes denoted as
$H(p) = -p \log p - (1 - p) \log(1 - p)$.

II. Polarization of binary memoryless sources with side information

Let $(X, Y) \sim P_{X,Y}$ be an arbitrary pair of random variables
over $\mathcal{X} \times \mathcal{Y}$ with $\mathcal{X} = \{0, 1\}$ and $\mathcal{Y}$ an arbitrary countable set.
Throughout this section, we regard $(X, Y)$ as a memoryless
source, with $X$ as the part to be compressed and $Y$ in the
role of "side-information" about $X$. We consider a sequence
$\{(X_i, Y_i)\}_{i=1}^{\infty}$ of independent drawings from $(X, Y)$ and write
$(X^N, Y^N)$ to denote the first $N$ elements of this sequence, for
any integer $N \ge 1$.

[Figure 1 graphic: two-input circuit with outputs $U_1 = X_1 \oplus X_2$ and $U_2 = X_2$.]

The basic idea of source polarization is contained in the
transformation shown in Fig. 1, where "$\oplus$" denotes addition
mod-2. The operation $(X_1, X_2) \to (U_1, U_2)$ performed by the
circuit preserves entropy, i.e.,

$$H(U_1, U_2 | Y_1, Y_2) = H(X_1, X_2 | Y_1, Y_2) = 2H(X|Y), \tag{1}$$

but is polarizing in the sense that

$$H(U_1 | Y_1, Y_2) \ge H(X|Y) \ge H(U_2 | Y_1, Y_2, U_1). \tag{2}$$

It is easy to show that equalities hold here if and only if
$H(X|Y)$ equals 0 or 1. Thus, unless the entropies at the input
of the circuit are already perfectly polarized, the entropies at
the output will polarize further.
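
As a numerical sanity check of (1) and (2), the following Python sketch computes the three conditional entropies by direct enumeration. The example pmf `pxy` (a uniform bit observed through a binary symmetric channel with crossover probability 0.1) and the helper names `cond_entropy` and `two_by_two` are illustrative assumptions, not part of the paper.

```python
from itertools import product
from math import log2

def cond_entropy(joint):
    """H(A|B) in bits for a pmf given as {(a, b): probability}."""
    marg_b = {}
    for (a, b), p in joint.items():
        marg_b[b] = marg_b.get(b, 0.0) + p
    return -sum(p * log2(p / marg_b[b]) for (a, b), p in joint.items() if p > 0)

def two_by_two(pxy):
    """Return (H(X|Y), H(U1|Y1,Y2), H(U2|Y1,Y2,U1)) for U1 = X1 xor X2, U2 = X2."""
    top, bottom = {}, {}
    for ((x1, y1), p1), ((x2, y2), p2) in product(pxy.items(), repeat=2):
        u1, u2, p = x1 ^ x2, x2, p1 * p2
        top[(u1, (y1, y2))] = top.get((u1, (y1, y2)), 0.0) + p
        bottom[(u2, (y1, y2, u1))] = bottom.get((u2, (y1, y2, u1)), 0.0) + p
    return cond_entropy(pxy), cond_entropy(top), cond_entropy(bottom)

# Hypothetical example source: X ~ Ber(1/2), Y = X flipped with probability 0.1.
pxy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
h, h_top, h_bottom = two_by_two(pxy)
print(h_top, h, h_bottom)  # the middle value lies strictly between the other two
```

For this source the two output entropies sum to $2H(X|Y)$, while one moves toward 1 and the other toward 0.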
Fig. 2. Four-by-four source transformation.

Figure 2 shows the recursive continuation of the construction
to the case where four independent copies of $(X, Y)$ are
processed. The entropy conservation law states that

$$H(U^4 | Y^4) = H(X^4 | Y^4) = 4H(X|Y).$$



Fig. 1. Basic source transformation. 



Using the chain rule, we may split the output entropy as

$$H(U^4 | Y^4) = \sum_{i=1}^{4} H(U_i | Y^4, U^{i-1}).$$

Note that the variables $U^4$ are assigned to the output terminals
of the circuit in Fig. 2 in a shuffled order. This is motivated
by the observation that, with this ordering, the pair $(U_1, U_2)$
is obtained from two i.i.d. RVs, namely, $(S_1, S_2)$, by the same
two-by-two construction as in Fig. 1. A similar remark applies
to the relationship between $(U_3, U_4)$ and $(R_1, R_2)$. These
observations lead to the following inequalities, which are
special cases of those in (2).

$$H(U_1 | Y^4) \ge H(S_1 | Y_1^2) = H(S_2 | Y_3^4) \ge H(U_2 | Y^4, U_1),$$

$$H(U_3 | Y^4, U^2) \ge H(R_1 | Y_1^2, S_1) = H(R_2 | Y_3^4, S_2) \ge H(U_4 | Y^4, U^3).$$

There is no general inequality between $H(U_2 | Y^4, U_1)$ and
$H(U_3 | Y^4, U^2)$. The conclusion to be drawn is that polarization
is enhanced further by repeating the basic construction.

For any $N = 2^n$, $n \ge 1$, the general form of the source
polarization transformation is defined algebraically as

$$G_N = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes n} B_N, \tag{3}$$

where "${}^{\otimes n}$" denotes the $n$th Kronecker power and $B_N$ is the
"bit-reversal" permutation (see [1]). It is easy to check that
the transforms in Figures 1 and 2 conform to $U^N = X^N G_N$.
The main result on source polarization for binary alphabets is
the following.

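Theorem 1. Let $(X, Y)$ be a source as above. For any $N = 2^n$,
$n \ge 1$, let $U^N = X^N G_N$. Then, for any $\delta \in (0, 1)$, as
$N \to \infty$,

$$\frac{\left|\{ i \in [1, N] : H(U_i | Y^N, U^{i-1}) \in (1 - \delta, 1] \}\right|}{N} \to H(X|Y)$$

and

$$\frac{\left|\{ i \in [1, N] : H(U_i | Y^N, U^{i-1}) \in [0, \delta) \}\right|}{N} \to 1 - H(X|Y).$$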
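
To make the statement concrete for a small block length, the sketch below builds $G_N$ from (3) and computes the terms $H(U_i | U^{i-1})$ exactly by exhaustive enumeration for a source without side information. It is only an illustration under stated assumptions: the function names, the choice $X \sim \mathrm{Ber}(0.11)$, and $N = 8$ are hypothetical, and the enumeration over all $2^N$ vectors is feasible only for very small $N$.

```python
from itertools import product
from math import log2

def polar_matrix(n):
    """G_N = F^{(x)n} B_N over GF(2), with F = [[1, 0], [1, 1]] and N = 2**n, n >= 1."""
    g = [[1]]
    for _ in range(n):
        # Kronecker product F (x) g = [[g, 0], [g, g]]
        g = [row + [0] * len(row) for row in g] + [row + row for row in g]
    N = 1 << n
    # Bit-reversal permutation applied to the columns.
    rev = [int(format(j, f"0{n}b")[::-1], 2) for j in range(N)]
    return [[row[rev[j]] for j in range(N)] for row in g]

def conditional_entropies(p, n):
    """Exact H(U_i | U^{i-1}), i = 1..N, for X ~ Ber(p) with no side information."""
    N = 1 << n
    G = polar_matrix(n)
    pu = {}
    for x in product((0, 1), repeat=N):
        u = tuple(sum(x[k] * G[k][j] for k in range(N)) % 2 for j in range(N))
        pu[u] = p ** sum(x) * (1 - p) ** (N - sum(x))

    def prefix_entropy(i):
        marg = {}
        for u, q in pu.items():
            marg[u[:i]] = marg.get(u[:i], 0.0) + q
        return -sum(q * log2(q) for q in marg.values() if q > 0)

    # Chain rule: H(U_i | U^{i-1}) = H(U^i) - H(U^{i-1}).
    return [prefix_entropy(i) - prefix_entropy(i - 1) for i in range(1, N + 1)]

hs = conditional_entropies(0.11, 3)
print([round(v, 3) for v in hs])
```

The printed values sum to $N H(X)$, and even at $N = 8$ they are visibly spread away from $H(X)$ toward 0 and 1.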

We omit the full proof but sketch the idea, which follows
the proof of the channel polarization result in [1]. The first step
is to define a tree random process for tracking the evolution
of the conditional entropy terms $\{H(U_i | Y^N, U^{i-1})\}$. The
analysis is aided by an accompanying supermartingale based
on the source Bhattacharyya parameters. For the basic source
$(X, Y) \sim P_{X,Y}$, this parameter is defined as

$$Z(X|Y) = 2 \sum_{y} P_Y(y) \sqrt{P_{X|Y}(0|y) P_{X|Y}(1|y)}.$$

The source Bhattacharyya parameters satisfy the following as
they undergo the two-by-two polarization transformation.

Proposition 1. Let $(X, Y)$ be a source as above, and $(X_1, Y_1)$
and $(X_2, Y_2)$ two independent drawings from $(X, Y)$. Then,

$$Z(X_1 \oplus X_2 | Y^2) \le 2Z(X|Y) - Z(X|Y)^2$$

and

$$Z(X_2 | Y^2, X_1 \oplus X_2) = Z(X|Y)^2.$$
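
The two relations can be checked numerically on a small example. The sketch below is a minimal illustration, assuming a hypothetical asymmetric source `pxy` with binary side information; the function names are ours and do not come from the paper.

```python
from itertools import product
from math import sqrt

def bhattacharyya(joint):
    """Z(A|B) = 2 * sum_b sqrt(P(0, b) * P(1, b)) for binary A, pmf {(a, b): prob}."""
    side = {b for (_, b) in joint}
    return 2 * sum(sqrt(joint.get((0, b), 0.0) * joint.get((1, b), 0.0)) for b in side)

def transformed_sources(pxy):
    """Joint pmfs of (U1, (Y1, Y2)) and (U2, (Y1, Y2, U1)) for U1 = X1 xor X2, U2 = X2."""
    top, bottom = {}, {}
    for ((x1, y1), p1), ((x2, y2), p2) in product(pxy.items(), repeat=2):
        u1, u2, p = x1 ^ x2, x2, p1 * p2
        top[(u1, (y1, y2))] = top.get((u1, (y1, y2)), 0.0) + p
        bottom[(u2, (y1, y2, u1))] = bottom.get((u2, (y1, y2, u1)), 0.0) + p
    return top, bottom

# Hypothetical example source with P(x, y) listed explicitly.
pxy = {(0, 0): 0.3, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.4}
z = bhattacharyya(pxy)
top, bottom = transformed_sources(pxy)
z_top, z_bottom = bhattacharyya(top), bhattacharyya(bottom)
print(z_top, "<=", 2 * z - z * z)   # inequality of Proposition 1
print(z_bottom, "vs", z * z)        # equality of Proposition 1
```

The helper uses the identity $P_Y(y)\sqrt{P_{X|Y}(0|y)P_{X|Y}(1|y)} = \sqrt{P_{X,Y}(0,y)P_{X,Y}(1,y)}$, so it works directly on the joint pmf.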



We omit the proof of this result since it is very similar to
the proof of a similar inequality on channel Bhattacharyya
parameters given in [1]. Thus, we have the inequality

$$Z(U_1 | Y^2) + Z(U_2 | Y^2, U_1) \le 2Z(X|Y),$$

which is the basis of the Bhattacharyya supermartingale.
Convergence results about the Bhattacharyya supermartingale may
be translated into similar results for the entropy martingale
through the following pair of inequalities.



Proposition 2. For $(X, Y)$ a source as above, the following
inequalities hold:

$$Z(X|Y)^2 \le H(X|Y), \tag{4}$$

$$H(X|Y) \le \log(1 + Z(X|Y)). \tag{5}$$

Either both inequalities are strict or both hold with equality.
For equality to hold, it is necessary and sufficient that $X$
conditioned on $Y$ is either deterministic or $\mathrm{Ber}(\tfrac{1}{2})$.

The proof is given in the appendix. 
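
For intuition, the unconditional special case $X \sim \mathrm{Ber}(p)$ of (4) and (5) can be checked on a grid of values of $p$. This is only a numerical illustration and not a substitute for the proof; the grid resolution and the names `h2` and `z` are arbitrary choices of ours.

```python
from math import log2, sqrt

def h2(p):
    """Binary entropy H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def z(p):
    """Bhattacharyya parameter of Ber(p) with no side information: 2 sqrt(p(1-p))."""
    return 2 * sqrt(p * (1 - p))

grid = [k / 1000 for k in range(1001)]
gap_lower = [h2(p) - z(p) ** 2 for p in grid]         # slack in (4)
gap_upper = [log2(1 + z(p)) - h2(p) for p in grid]    # slack in (5)
print(min(gap_lower), min(gap_upper))
```

Both slacks are nonnegative on the grid and vanish at $p \in \{0, \tfrac{1}{2}, 1\}$, in line with the equality condition of the proposition.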

These inequalities serve the purpose of showing that
$H(X|Y)$ is near 0 or 1 if and only if $Z(X|Y)$ is near 0 or
1, respectively. Hence, the parameters $\{H(U_i | Y^N, U^{i-1})\}_{i=1}^{N}$
and $\{Z(U_i | Y^N, U^{i-1})\}_{i=1}^{N}$ polarize simultaneously.

For coding theorems, it is important to have a rate of 
convergence result. 

Definition 1. Let $(X, Y)$ be a source as above, and let
$R > 0$. For $N = 2^n$, $n \ge 1$, let $E_{X|Y}(N, R)$ denote a
subset of $\{1, \ldots, N\}$ such that $|E_{X|Y}(N, R)| = \lceil NR \rceil$ and
$Z(U_i | Y^N, U^{i-1}) \ge Z(U_j | Y^N, U^{j-1})$ for all $i \in E_{X|Y}(N, R)$
and $j \notin E_{X|Y}(N, R)$. We refer to $E_{X|Y}(N, R)$ as a
"high-entropy" (index) set of rate $R$ and block-length $N$. For the
special case where $Y$ is absent or unavailable, we write
$E_X(N, R)$ to denote the high-entropy set of $X$ only. When
$N$ and $R$ are clear from the context, we simplify the notation
by writing $E_{X|Y}$ or $E_X$.
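
Operationally, the definition amounts to keeping the $\lceil NR \rceil$ indices with the largest Bhattacharyya parameters. The sketch below assumes the values $Z(U_i | Y^N, U^{i-1})$ have already been computed by some other means; the list `z_values` is made-up illustrative data, and ties are broken arbitrarily by the sort.

```python
from math import ceil

def high_entropy_set(z_values, rate):
    """Return the 1-based indices of the ceil(N * rate) largest entries of z_values."""
    N = len(z_values)
    size = ceil(N * rate)
    order = sorted(range(1, N + 1), key=lambda i: z_values[i - 1], reverse=True)
    return sorted(order[:size])

# Hypothetical Bhattacharyya parameters for N = 8 (not derived from a real source).
z_values = [0.99, 0.8, 0.7, 0.2, 0.6, 0.1, 0.05, 0.001]
E = high_entropy_set(z_values, rate=0.5)
print(E)  # [1, 2, 3, 5]
```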

Theorem 2. Let $(X, Y)$ be a source as above and $R >
H(X|Y)$ be fixed. Consider a sequence of high-entropy sets
$\{E_{X|Y}(N, R) : N = 2^n, n \ge 1\}$. For any such sequence, any
fixed $\beta < \tfrac{1}{2}$, and asymptotically in $N$, we have

$$\sum_{i \in E_{X|Y}^c(N, R)} Z(U_i | Y^N, U^{i-1}) = O(2^{-N^{\beta}}).$$

We omit the proof, which is covered by the results of [4].

III. Lossless source coding 

Let $(X, Y)$ be a source as in the previous section and
$(X^N, Y^N)$ denote an output block of length $N \ge 1$ produced
by this source. Shannon's lossless source coding theorem states
that an encoder can compress $(X^N, Y^N)$ into a codeword of
length roughly $NH(X|Y)$ bits so that a decoder observing
the codeword and $Y^N$ can recover $X^N$ reliably, provided
$N$ is sufficiently large. We now describe a method based
on polarization that achieves this compression bound. In the
absence of any side information $Y$, the method given here is
algorithmically identical to the source coding method proposed
in [2] and [3]; however, our viewpoint is different. Instead
of reducing the source coding problem to a channel coding
problem by exploiting a duality relationship between the two
problems, we use direct arguments based solely on source
polarization.

Fix $N = 2^n$ for some $n \ge 1$. Fix $R > H(X|Y)$ and a
high-entropy set $E_{X|Y} = E_{X|Y}(N, R)$.

Encoding: Given a realization $X^N = x^N$, compute $u^N =
x^N G_N$ and output $u_{E_{X|Y}}$ as the compressed word. (Note that
the encoder does not require knowledge of the realization of
$Y^N$ to implement this scheme.)
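
The encoding step can be sketched in a few lines of Python. The recursion below computes $x^N G_N$ over GF(2) by pairing odd- and even-indexed inputs, which is one standard way to realize the transform with $O(N \log N)$ XOR operations; the function names and the example index set are hypothetical.

```python
def polar_transform(x):
    """Return x G_N over GF(2) for a bit list x of length N = 2**n."""
    N = len(x)
    if N == 1:
        return list(x)
    odd, even = x[0::2], x[1::2]          # x_1, x_3, ... and x_2, x_4, ...
    mixed = [a ^ b for a, b in zip(odd, even)]
    # First half of the output comes from the XORed pairs, second half from the even part.
    return polar_transform(mixed) + polar_transform(even)

def encode(x, high_entropy_set):
    """Compressed word: the coordinates of u = x G_N indexed (1-based) by the given set."""
    u = polar_transform(x)
    return [u[i - 1] for i in sorted(high_entropy_set)]

# Hypothetical usage with N = 4 and an arbitrary index set {1, 3}.
print(encode([1, 1, 0, 1], {1, 3}))  # [1, 0]
```

Applying `polar_transform` twice returns the original vector, which reflects the fact that $G_N$ is its own inverse over GF(2).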

Decoding: Having received $u_{E_{X|Y}}$ and observed the
realization $Y^N = y^N$, the decoder sequentially builds an estimate
$\hat{u}^N$ of $u^N$ by the rule

$$\hat{u}_i = \begin{cases} u_i & \text{if } i \in E_{X|Y}, \\ 0 & \text{if } i \in E_{X|Y}^c \text{ and } L_N^{(i)}(y^N, \hat{u}^{i-1}) \ge 1, \\ 1 & \text{else,} \end{cases}$$

where

$$L_N^{(i)}(y^N, \hat{u}^{i-1}) = \frac{\Pr(U_i = 0 | Y^N = y^N, U^{i-1} = \hat{u}^{i-1})}{\Pr(U_i = 1 | Y^N = y^N, U^{i-1} = \hat{u}^{i-1})}$$

is a likelihood ratio, which can be computed recursively using
the formulas

$$L_N^{(2i-1)}(y^N, \hat{u}^{2i-2}) = \frac{L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_o^{2i-2} \oplus \hat{u}_e^{2i-2}) \, L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_e^{2i-2}) + 1}{L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_o^{2i-2} \oplus \hat{u}_e^{2i-2}) + L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_e^{2i-2})}$$

and

$$L_N^{(2i)}(y^N, \hat{u}^{2i-1}) = \left[ L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_o^{2i-2} \oplus \hat{u}_e^{2i-2}) \right]^{\delta_i} L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_e^{2i-2}),$$

where $\hat{u}_o^{2i-2}$ and $\hat{u}_e^{2i-2}$ denote, respectively, the parts of $\hat{u}^{2i-2}$
with odd and even indices, and $\delta_i$ equals 1 or -1 according to
$\hat{u}_{2i-1}$ being 0 or 1, respectively. Having constructed $\hat{u}^N$, the
decoder outputs $\hat{x}^N = \hat{u}^N G_N^{-1}$ as the estimate of $x^N$. (It is
easy to verify that $G_N^{-1} = G_N$.)
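
The decoding rule and the likelihood-ratio recursion can be sketched as follows. This is a naive illustration under stated assumptions: the function names are ours, the per-symbol ratios `lrs[k]` $= \Pr(X = 0 | Y = y_k) / \Pr(X = 1 | Y = y_k)$ are assumed strictly positive and finite, and intermediate ratios are recomputed for every index, so the sketch does not attain the $O(N \log N)$ complexity of an efficient implementation.

```python
def polar_transform(x):
    """Return x G_N over GF(2) for a bit list x of length N = 2**n."""
    if len(x) == 1:
        return list(x)
    odd, even = x[0::2], x[1::2]
    return polar_transform([a ^ b for a, b in zip(odd, even)]) + polar_transform(even)

def likelihood_ratio(lrs, u_prev):
    """L_N^{(i)} for i = len(u_prev) + 1, given per-symbol ratios and past decisions."""
    N = len(lrs)
    if N == 1:
        return lrs[0]
    i = len(u_prev) + 1
    prefix = u_prev if i % 2 == 1 else u_prev[:-1]   # the first 2k - 2 decisions
    odd, even = prefix[0::2], prefix[1::2]
    a = likelihood_ratio(lrs[:N // 2], [s ^ t for s, t in zip(odd, even)])
    b = likelihood_ratio(lrs[N // 2:], even)
    if i % 2 == 1:
        return (a * b + 1) / (a + b)                 # odd-index formula
    return b / a if u_prev[-1] == 1 else a * b       # even-index formula, delta = -1 or +1

def decode(lrs, known):
    """known maps each 1-based index in the high-entropy set to its received bit."""
    u_hat = []
    for i in range(1, len(lrs) + 1):
        if i in known:
            u_hat.append(known[i])
        else:
            u_hat.append(0 if likelihood_ratio(lrs, u_hat) >= 1 else 1)
    return polar_transform(u_hat)                    # G_N is its own inverse

# Hypothetical usage: N = 8, every symbol has Pr(X = 0 | y) = 0.9, nothing transmitted.
print(decode([9.0] * 8, {}))  # [0, 0, 0, 0, 0, 0, 0, 0]
```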

Performance: The performance of the decoder is measured
by the probability of error

$$P_e = \Pr(\hat{U}^N \ne U^N) = \Pr(\hat{U}_{E_{X|Y}^c} \ne U_{E_{X|Y}^c}),$$

which can be upper-bounded by standard (union-bound)
techniques as

$$P_e \le \sum_{i \in E_{X|Y}^c(N, R)} Z(U_i | Y^N, U^{i-1}). \tag{7}$$

The following is a simple corollary to Theorem 2 and (7).

Theorem 3. For any fixed $R > H(X|Y)$ and $\beta < \tfrac{1}{2}$, the
probability of error for the above polar source coding method
is bounded as $P_e = O(2^{-N^{\beta}})$.

Complexity: The complexity of encoding and that of decoding
are both $O(N \log N)$.

IV. Application to channel coding: Duality 

The above source coding scheme can be used to design
a capacity-achieving code for any binary-input memoryless
channel. Let such a channel be defined by the transition
probabilities $W(y|x)$, $x \in \mathcal{X} = \{0, 1\}$ and $y \in \mathcal{Y}$. Consider
the block coding scheme shown in Fig. 3, where signals flow
from right to left. Here, $N = 2^n$, $n \ge 1$, is the code block
length; $U^N$ denotes the message vector, $X^N = U^N G_N$ the
channel input vector, and $Y^N$ the channel output vector. Due
to memorylessness, $W^N(y^N | x^N) = \prod_{i=1}^{N} W(y_i | x_i)$ for all
$x^N \in \mathcal{X}^N$, $y^N \in \mathcal{Y}^N$.

Fig. 3. Channel coding.

We turn the triple $(U^N, X^N, Y^N)$ into a joint ensemble
of random vectors by assigning the probabilities $\Pr(X^N =
x^N) = 2^{-N}$ for all $x^N \in \{0, 1\}^N$. Under this assignment,
$(X^N, Y^N)$ may be regarded as independent samples from
a source $(X, Y) \sim Q(x)W(y|x)$ where $Q$ is the uniform
distribution on $\{0, 1\}$. We let $I(W) = I(X; Y)$ denote the
symmetric channel capacity and fix $R < I(W)$. This implies
that $1 - R > H(X|Y)$. Let $E_{X|Y} = E_{X|Y}(N, 1 - R)$ denote
a high-entropy set of rate $(1 - R)$ for the source $(X, Y)$. The
following coding scheme achieves reliable communication at
rate $R$ over the channel $W$.

Encoding: Prepare a binary source vector $U^N$ as follows.
Pick the pattern $U_{E_{X|Y}}$ at random from the uniform
distribution and make it available to the decoder ahead of the
session. In each round, fill $U_{E_{X|Y}^c}$ with uniformly chosen data
bits. (Thus, $\lfloor NR \rfloor$ bits are sent in each round, for a data
transmission rate of roughly $R$.) Encode $U^N$ into a channel
codeword by computing $X^N = U^N G_N$ and transmit $X^N$ over
the channel $W$.

Decoding: Having received $Y^N$, use the source decoder of
the previous section to produce an estimate $\hat{U}_{E_{X|Y}^c}$ of the data
bits $U_{E_{X|Y}^c}$.

Analysis: The error probability $\Pr(\hat{U}_{E_{X|Y}^c} \ne U_{E_{X|Y}^c})$ is
bounded as $O(2^{-N^{\beta}})$ for any fixed $\beta < \tfrac{1}{2}$ since the source
coding rate is $1 - R > H(X|Y)$. The complexity of the scheme
is bounded as $O(N \log N)$.

Remark. The above argument reduces the channel coding
problem for achieving the symmetric capacity $I(W)$ of a
binary-input channel $W$ to a source coding problem for a
source $(X, Y) \sim QW$ where $Q$ is uniform on $\{0, 1\}$. This
reduction exploits the duality of the two problems. This dual
approach provides an alternative proof of the channel coding
results of [1]. It also complements the duality arguments in
[2] and [3], where the source coding problem for a $\mathrm{Ber}(p)$
source was reduced to a channel coding problem for a binary
symmetric channel with cross-over probability $p$.

V. Slepian-Wolf Coding 

The above source coding method can be easily extended
to the Slepian-Wolf setting [5]. Suppose $\{(X_i, Y_i)\}_{i=1}^{\infty}$ are
independent samples from a source $(X, Y)$ where both $X$
and $Y$ are binary RVs. In the Slepian-Wolf scenario, there are
two encoders and one decoder. Fix a block-length $N = 2^n$,
$n \ge 1$, and rates $R_x$ and $R_y$ for the two encoders. Encoder 1
observes $X^N$ only and maps it to an integer $i_x \in [1, 2^{N R_x}]$;
encoder 2 observes $Y^N$ only and maps it to an integer
$i_y \in [1, 2^{N R_y}]$. The decoder in the system observes $(i_x, i_y)$
and tries to recover $(X^N, Y^N)$ with vanishing probability of
error. The well-known Slepian-Wolf theorem states that this
is possible provided $R_x > H(X|Y)$, $R_y > H(Y|X)$, and
$R_x + R_y > H(X, Y)$.

It is straightforward to design a polar coding scheme that
achieves the corner point $(H(X|Y), H(Y))$ of the
Slepian-Wolf rate region. Fix $R_y > H(Y)$ and $R_x > H(X|Y)$. For
$N = 2^n$, $n \ge 1$, consider a pair of high-entropy sets $E_Y =
E_Y(N, R_y)$ and $E_{X|Y} = E_{X|Y}(N, R_x)$.

Encoding: Given a realization $X^N = x^N$, encoder 1
calculates $u^N = x^N G_N$ and sends $u_{E_{X|Y}}$ to the common
decoder. Given a realization $Y^N = y^N$, encoder 2 calculates
$v^N = y^N G_N$ and sends $v_{E_Y}$.

Decoding: The decoder first applies the decoding algorithm
of Section III to obtain an estimate $\hat{y}^N$ of $y^N$ from $v_{E_Y}$. Next,
the decoder applies the same algorithm to obtain an estimate
of $x^N$ using $\hat{y}^N$ (as a substitute for the actual realization $y^N$)
and $u_{E_{X|Y}}$.

We omit the analysis of this scheme since it essentially
consists of two single-user source coding schemes of the type
treated in Section III.

It is clear that polar coding can achieve all points of the
Slepian-Wolf region by time-sharing between the corner points
$(H(X), H(Y|X))$ and $(H(X|Y), H(Y))$.
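
The time-sharing argument can be made concrete with a short computation. The sketch below, written for a binary pair with a hypothetical joint pmf `pxy`, evaluates the two corner points and a convex combination of them; the function names and the time-sharing fraction are illustrative assumptions.

```python
from math import log2

def entropy(pmf):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in pmf if p > 0)

def corner_points(pxy):
    """Return ((H(X), H(Y|X)), (H(X|Y), H(Y))) for a pmf {(x, y): prob} on bits."""
    hxy = entropy(pxy.values())
    hx = entropy([sum(p for (x, _), p in pxy.items() if x == a) for a in (0, 1)])
    hy = entropy([sum(p for (_, y), p in pxy.items() if y == b) for b in (0, 1)])
    return (hx, hxy - hx), (hxy - hy, hy)

def time_share(pxy, lam):
    """Rate pair obtained by using the first corner point a fraction lam of the time."""
    a, b = corner_points(pxy)
    return tuple(lam * s + (1 - lam) * t for s, t in zip(a, b))

# Hypothetical source: X ~ Ber(1/2), Y = X flipped with probability 0.1.
pxy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
rx, ry = time_share(pxy, 0.25)
print(rx, ry, rx + ry)  # the sum equals H(X, Y)
```

Every time-shared pair lies on the line $R_x + R_y = H(X, Y)$ between the two corner points, which is the dominant face of the Slepian-Wolf region.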

We should remark that polar coding for the Slepian-Wolf
problem was first studied in [6], [2], and [3] under the assumptions
that $X, Y \sim \mathrm{Ber}(\tfrac{1}{2})$ and $X \oplus Y \sim \mathrm{Ber}(p)$.

The above approach to Slepian-Wolf coding reduces the
problem to single-user source coding problems. A direct
approach would be to have each encoder apply polar transforms
locally, with encoder 1 computing $U^N = X^N G_N$ and encoder
2 computing $V^N = Y^N G_N$. Preliminary analyses show
that such local operations polarize $X^N$ and $Y^N$ not only
individually but also in a joint sense. A detailed study of such
schemes is left for future work.



VI. Polarization of non-binary memoryless sources

Theorem 4. Let $X \sim P_X$ be a memoryless source over $\mathcal{X} =
\{0, 1, \ldots, q - 1\}$ for some prime $q \ge 2$. For $n \ge 1$ and $N =
2^n$, let $X^N = (X_1, \ldots, X_N)$ be $N$ independent drawings from
the source $X$. Let $U^N = X^N G_N$ where $G_N$ is as defined in
(3) but the matrix operation is now carried out in GF($q$). Then,
the polarization limits in Theorem 1 remain valid provided the
entropy terms are calculated with respect to base-$q$ logarithms.

If $q$ is not prime, the theorem may fail. Consider $X$
over $\{0, 1, 2, 3\}$ with $P_X(0) = P_X(2) = \tfrac{1}{2}$. Then, it is
straightforward to check that $U^N$ has the same distribution
as $X^N$ for all $N$. On closer inspection, we realize that $X$ is
actually a binary source under disguise. More precisely, $X$ is
already polarized over $\{0, 2\}$, which is a subfield of GF(4),
and vectors over this subfield are closed under multiplication
by $G_N$.
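
The claim about the counterexample can be verified by enumeration for a small block length. The sketch below assumes that the alphabet $\{0, 1, 2, 3\}$ is equipped with integer addition modulo 4; this is our reading of the example and is labeled as an assumption. The function name and the choice $N = 4$ are also illustrative.

```python
from itertools import product
from collections import Counter

def polar_transform_mod_q(x, q):
    """Return x G_N with additions taken modulo q (assumed arithmetic), len(x) = 2**n."""
    if len(x) == 1:
        return list(x)
    odd, even = x[0::2], x[1::2]
    mixed = [(a + b) % q for a, b in zip(odd, even)]
    return polar_transform_mod_q(mixed, q) + polar_transform_mod_q(even, q)

# X is uniform on {0, 2}; enumerate all equally likely source blocks of length 4.
N = 4
counts = Counter(
    tuple(polar_transform_mod_q(list(x), 4)) for x in product((0, 2), repeat=N)
)
print(len(counts), set(counts.values()))  # 16 distinct outputs, each hit exactly once
```

Every output vector again has entries in $\{0, 2\}$ and the map is one-to-one, so $U^N$ is uniform on $\{0, 2\}^N$ just like $X^N$ and no polarization takes place.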

The preceding example illustrates the difficulties in making
a general statement regarding source polarization over
arbitrary alphabets. If we introduce some randomness into
the construction as in [7], it is possible to polarize sources
over arbitrary alphabets, still maintaining the $O(N \log N)$
complexity of the construction.
VII. Appendix

A. Proof of Inequality (4)

First we prove that $Z(X)^2 \le H(X)$ for any $X \sim \mathrm{Ber}(p)$,
with equality if and only if $p \in \{0, \tfrac{1}{2}, 1\}$. Let $F(p) = H(X) -
Z(X)^2 = -p \log_2 p - (1 - p) \log_2(1 - p) - 4p(1 - p)$, and
compute

$$\frac{dF}{dp} = \frac{1}{\ln 2} \left[ -\ln p + \ln(1 - p) \right] - 4 + 8p,$$

$$\frac{d^2F}{dp^2} = -\frac{1}{\ln 2} \left[ \frac{1}{p} + \frac{1}{1 - p} \right] + 8,$$

$$\frac{d^3F}{dp^3} = \frac{1}{\ln 2} \left[ \frac{1}{p^2} - \frac{1}{(1 - p)^2} \right].$$

Inspection of the third order derivative shows that $dF/dp$ is
strictly convex for $p \in [0, \tfrac{1}{2})$ and strictly concave for $p \in
(\tfrac{1}{2}, 1]$. Thus, $dF/dp = 0$ can have at most one solution in
each interval $[0, \tfrac{1}{2})$ and $(\tfrac{1}{2}, 1]$. Since $dF/dp = 0$ at $p = \tfrac{1}{2}$, the
number of zeros of $dF/dp$ over $[0, 1]$ is at most three. Thus,
$F(p)$ can have at most three zeros over $[0, 1]$. Since $F(p) = 0$
for $p \in \{0, \tfrac{1}{2}, 1\}$, there can be no other zeros.

Thus, for any pair of random variables $(X, Y)$ with $X$
binary, if we condition on $Y = y$, we have

$$Z(X | Y = y)^2 \le H(X | Y = y).$$

Averaging over $Y$, and by Jensen's inequality, we obtain (4).

B. Proof of Inequality (5)

Recall that the Renyi entropy of order $\alpha$ ($\alpha > 0$, $\alpha \ne 1$)
for a RV $X$ is defined as

$$H_{\alpha}(X) = \frac{1}{1 - \alpha} \log \sum_{x} P_X(x)^{\alpha}$$

and has the following properties [8].

• $H_{\alpha}(X)$ is strictly decreasing in $\alpha$ unless $P_X$ is uniform
  on its support $\mathrm{Supp}(X) = \{x : P_X(x) > 0\}$.
• $H(X) = \lim_{\alpha \to 1} H_{\alpha}(X)$.

Now suppose $X \sim \mathrm{Ber}(p)$ and note that

$$H_{1/2}(X) = \log \left[ \sum_{x} \sqrt{P_X(x)} \right]^2 = \log(1 + Z(X)).$$

Thus, we have

$$H(X) \le H_{1/2}(X) = \log(1 + Z(X)).$$

It follows that, for any jointly distributed pair $(X, Y)$ with
$X$ binary and any sample value $Y = y$,

$$H(X | Y = y) \le \log(1 + Z(X | Y = y)).$$

Averaging over $Y$ and by Jensen's inequality, we obtain (5).